Policy Iteration Based on a Learned Transition Model

Authors

Vivek Ramavajjala

Charles Elkan

Abstract

This paper investigates a reinforcement learning method that combines learning a model of the environment with least-squares policy iteration (LSPI). The LSPI algorithm learns a linear approximation of the optimal state-action value function; the idea studied here is to let this value function depend on a learned estimate of the expected next state instead of directly on the current state and action. This approach makes it easier to define useful basis functions, and hence to learn a useful linear approximation of the value function. Experiments show that the new algorithm, called NSPI for next-state policy iteration, performs well on two standard benchmarks, the well-known mountain car and inverted pendulum swing-up tasks. More importantly, the NSPI algorithm performs well, and better than a specialized recent method, on a resource management task known as the day-ahead wind commitment problem. This latter task has action and state spaces that are high-dimensional and continuous.
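The core idea of the abstract, a value function evaluated on a predicted next state rather than on the raw state-action pair, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the scalar state and action, the one-parameter transition model s' ≈ s + k·a, the polynomial basis, and all function names are hypothetical, and the weights `w` are set by hand where LSPI would normally fit them.

```python
from typing import List, Sequence, Tuple

# One observed transition: (state, action, next_state). Scalars are an assumption.
Transition = Tuple[float, float, float]


def fit_transition_gain(data: Sequence[Transition]) -> float:
    """Least-squares fit of k in the assumed model s' ~ s + k * a."""
    num = sum(a * (s_next - s) for s, a, s_next in data)
    den = sum(a * a for _, a, _ in data)
    return num / den


def predict_next_state(k: float, s: float, a: float) -> float:
    """Learned estimate of the expected next state."""
    return s + k * a


def features(s_next: float) -> List[float]:
    """Hypothetical polynomial basis defined over the predicted next state only."""
    return [1.0, s_next, s_next * s_next]


def q_value(w: Sequence[float], k: float, s: float, a: float) -> float:
    """Linear value estimate: w . phi(predicted next state)."""
    phi = features(predict_next_state(k, s, a))
    return sum(wi * fi for wi, fi in zip(w, phi))


def greedy_action(w: Sequence[float], k: float, s: float,
                  actions: Sequence[float]) -> float:
    """Pick the candidate action whose predicted next state scores highest."""
    return max(actions, key=lambda a: q_value(w, k, s, a))


# Toy data generated by s' = s + 0.5 * a.
data = [(0.0, 1.0, 0.5), (1.0, -1.0, 0.5), (2.0, 2.0, 3.0)]
k = fit_transition_gain(data)

# Hand-set weights standing in for what LSPI would learn:
# the value is highest when the next state is near zero.
w = [0.0, 0.0, -1.0]
best = greedy_action(w, k, 1.0, [-2.0, -1.0, 0.0, 1.0, 2.0])
print(k, best)
```

Because the basis functions take only the predicted next state as input, the same features serve every action; the action enters solely through the learned transition model, which is the property the abstract credits with making basis functions easier to define.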

Similar resources

A Convergent Form of Approximate Policy Iteration

We study a new, model-free form of approximate policy iteration which uses Sarsa updates with linear state-action value function approximation for policy evaluation, and a “policy improvement operator” to generate a new policy based on the learned state-action values. We prove that if the policy improvement operator produces ε-soft policies and is Lipschitz continuous in the action values, with ...

Investigating the Effects of Financial Risks with Central Bank Policy Intervention and Foreign Exchange Market Pressure on the Stability of Banking sector Based on Gerton and Ruper Model: Nonlinear Smooth Transition Regression Approach

In the first stage of the present study, the central bank policy intervention index and the foreign exchange market pressure index were calculated using the Gerton and Roper (1977) model. Then, using the STAR regression model, the nonlinear effects of financial risks, together with central bank policy intervention and foreign exchange market pressure, on the country's banking stability are examined....

Metacontrol for Adaptive Imagination-Based Optimization

Many machine learning systems are built to solve the hardest examples of a particular task, which often makes them large and expensive to run—especially with respect to the easier examples, which might require much less computation. For an agent with a limited computational budget, this “one-size-fits-all” approach may result in the agent wasting valuable computation on easy examples, while not...

Investigating transition pathways and transition failures and proposing policy solutions to cope with transition failures; Iran’s wind turbine industry

This paper investigates the transition paths of Iran’s wind turbine industry and the reasons it has failed to transition successfully to sustainability. To this end, a qualitative approach and a case-study strategy were used. Based on the type of national innovation system, the transition path and sustainability transition status of the wind turbines industry in selected countr...

Publication date: 2012
